A memory efficient implementation of the .mtx reading function #3389
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
## main #3389 +/- ##
==========================================
- Coverage 75.44% 75.41% -0.04%
==========================================
Files 113 113
Lines 13250 13266 +16
==========================================
+ Hits 9997 10005 +8
- Misses 3253 3261 +8
for more information, see https://pre-commit.ci
Good idea! Some small notes:
Please note that the selection of the chunk size of
…ing function (#3483) Co-authored-by: Gustavo Jeuken <[email protected]>
Pandas' `read_csv` function is very memory intensive, which makes loading data (especially large datasets from the EBI Single Cell Expression Atlas) impossible on computers with 16 GB of RAM or less. The subsequent analysis of such datasets with Scanpy, however, works well on such computers. Loading the data in chunks, using the same pandas function, solves this problem.
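The chunked approach described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the function name `read_mtx_in_chunks` and the default `chunksize` are assumptions, and real `.mtx` (MatrixMarket) files also carry `%`-prefixed comment lines and a size line that a full reader would handle separately.

```python
import pandas as pd


def read_mtx_in_chunks(path, chunksize=100_000):
    """Read a large whitespace-delimited coordinate file in chunks.

    Hypothetical sketch: passing `chunksize` to pandas.read_csv makes it
    return an iterator of DataFrames instead of parsing the whole file
    at once, so peak memory stays close to one chunk at a time.
    """
    chunks = pd.read_csv(
        path,
        sep=r"\s+",     # .mtx coordinate lines are whitespace-separated
        header=None,    # the file has no column header row
        comment="%",    # skip MatrixMarket '%%' banner/comment lines
        chunksize=chunksize,
    )
    # Concatenate the parsed chunks into one DataFrame.
    return pd.concat(chunks, ignore_index=True)
```

The key design point is that `chunksize` bounds the memory used during parsing; the final `concat` still materializes the full table, which is typically much smaller than pandas' transient parsing overhead for a single monolithic `read_csv` call.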